{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 作业要求"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用tensorflow，构造并训练一个神经网络，在测试机上达到超过98%的准确率。  \n",
    "需要基础知识：  \n",
    "  深度神经网络  \n",
    "  激活函数  \n",
    "  正则化  \n",
    "  初始化  \n",
    "\n",
    "探索超参数设置：  \n",
    "  隐层数量  \n",
    "  各隐层中神经元数量  \n",
    "  学习率  \n",
    "  正则化因子  \n",
    "  权重初始化分布参数  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 最简模型（感知器）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\h5py\\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
      "  from ._conv import register_converters as _register_converters\n"
     ]
    }
   ],
   "source": [
    "from __future__ import absolute_import\n",
    "from __future__ import division\n",
    "from __future__ import print_function\n",
    "\n",
    "import argparse\n",
    "import sys\n",
    "\n",
    "from tensorflow.examples.tutorials.mnist import input_data\n",
    "\n",
    "import tensorflow as tf\n",
    "\n",
    "FLAGS = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "调用系统提供的Mnist数据函数为我们读入数据，自行下载数据集也可以（四个压缩包）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting /tmp/tensorflow/mnist/input_data\\train-images-idx3-ubyte.gz\n",
      "Extracting /tmp/tensorflow/mnist/input_data\\train-labels-idx1-ubyte.gz\n",
      "Extracting /tmp/tensorflow/mnist/input_data\\t10k-images-idx3-ubyte.gz\n",
      "Extracting /tmp/tensorflow/mnist/input_data\\t10k-labels-idx1-ubyte.gz\n"
     ]
    }
   ],
   "source": [
    "# Import data\n",
    "data_dir = '/tmp/tensorflow/mnist/input_data'\n",
    "mnist = input_data.read_data_sets(data_dir, one_hot=True)\n",
    "# 导入数据，选择one-hot编码，数据集为10分类问题"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the model\n",
    "x = tf.placeholder(tf.float32, [None, 784])  # 定义placeholder用于后续feed数据\n",
    "W = tf.Variable(tf.zeros([784, 10]))         # 定义变量，初始化权重参数为0，左边接收784维的图像数据，右边输出10维的分类数据\n",
    "b = tf.Variable(tf.zeros([10]))              # 定义变量，初始化偏差参数为0\n",
    "y = tf.matmul(x, W) + b                      # 矩阵运算，得出y输出（类似于单个感知器）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define loss and optimizer\n",
    "y_ = tf.placeholder(tf.float32, [None, 10])  # 定义placeholder用于存放数据的分类真值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:From <ipython-input-5-ca50b045414f>:3: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "\n",
      "Future major versions of TensorFlow will allow gradients to flow\n",
      "into the labels input on backprop by default.\n",
      "\n",
      "See tf.nn.softmax_cross_entropy_with_logits_v2.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 使用交叉熵损失作为目标函数，使用softmax作为分类器（计算各分类的概率）\n",
    "# f.nn.softmax_cross_entropy_with_logits（）参数是真值和预测值，函数处理包含softmax和交叉熵，传入的参数y是未经过激活的\n",
    "cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 生成一个step，使用梯度下降法优化模型权重和偏差，学习率为0.5，优化对象为交叉熵损失（目标函数）\n",
    "train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)\n",
    "\n",
    "sess = tf.Session()    # 定义一个session\n",
    "init_op = tf.global_variables_initializer()  # 初始化全局变量\n",
    "sess.run(init_op)      # 执行初始化操作"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 开始训练，总数据集为6万，每次训练使用一个batch，一个batch大小为100个数据，循环3000次，共3000*100/60000=5 epochs\n",
    "for _ in range(3000):\n",
    "  batch_xs, batch_ys = mnist.train.next_batch(100)\n",
    "  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9216\n"
     ]
    }
   ],
   "source": [
    "# 测试集上查看训练效果\n",
    "correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))    # argmax结果：取最大值的索引号\n",
    "accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))  # tf.cast 类型转换\n",
    "print(sess.run(accuracy, feed_dict={x: mnist.test.images,y_: mnist.test.labels}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*** 由感知器做出来的模型已经可以达到92%左右的准确率 ***"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 神经网络模型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "需要对比不同参数的训练效果，使用相同的测试集，故先定义一个测试函数，用于对比各个模型的训练表现情况"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def pred_accuracy(logists, gamma, batch_size):\n",
    "    # 把初始化全局变量的操作放在函数中，之后每个不同训练模型不用重复写初始化代码\n",
    "    sess.run(tf.global_variables_initializer())  \n",
    "    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=logists))  \n",
    "    train_step = tf.train.GradientDescentOptimizer(gamma).minimize(cross_entropy) \n",
    "    \n",
    "    for i in range(3000):                                             \n",
    "        batch_xs, batch_ys = mnist.train.next_batch(batch_size)                            \n",
    "        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})\n",
    "        \n",
    "    correct_prediction = tf.equal(tf.argmax(logists,1), tf.argmax(y_,1))      \n",
    "    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) \n",
    "    print('accuracy =',sess.run(accuracy,feed_dict={x: mnist.test.images, y_: mnist.test.labels}))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "分析：当前的感知器模型太简单了，在进行不同参数进行训练对比时，如学习率、batch大小、正则化、权重初始化等，可能这些参数对结果的影响并不明显，比如正则化参数本来就是用来限制模型复杂度，防止过拟合等问题，对于太简单的模型效果应该不明显。目前最需要做的应该是提升模型的复杂度，让模型的参数先丰富起来，在进行这些超参数的调节对比，故第一步先增加隐层和神经元，并以此为基础，进行后续参数调节。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参数1： 增加隐层"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "先增加一个隐层，隐层中神经元数目设置为数据原始维数的一半（784/2=392）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy = 0.1135\n"
     ]
    }
   ],
   "source": [
    "W1 = tf.Variable(tf.zeros([784, 392]))   # 784为原始数据的维数，392为隐层中神经元个数，初始化权重为0  \n",
    "b1 = tf.Variable(tf.zeros([392]))        # 392为对应的bias维数              \n",
    "logist1 = tf.matmul(x, W1) + b1          # 计算出logist \n",
    "y1 = tf.nn.relu(logist1)                 # 有很多种激活函数，不过智老师的介绍中，relu是一种很常用的激活函数，这里暂时先用这个\n",
    "\n",
    "W2 = tf.Variable(tf.zeros([392, 10]))    # 隐层向输出层转化，392为上层输入的神经元个数，10为输出的维数   \n",
    "b2 = tf.Variable(tf.zeros([10]))         # 对应10维的bias\n",
    "logist2 = tf.matmul(y1, W2) + b2          # 计算出logist，此时不进行激活，因为即将调用的softmax-交叉熵函数有自带激活\n",
    "\n",
    "pred_accuracy(logists=logist2, gamma=0.5, batch_size=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "0.1135是什么鬼啊，怎么跟随机猜测的概率差不多(1/10)......看群里小伙伴不是说随便加个隐层准确率就蹭蹭蹭上去了么，大骗子"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "后来发现，神经网络中的权重不能初始化为零。  \n",
    "在单个感知器中，权重初始化为零，计算后直接就输出和真值对比，然后梯度下降来调整权重，一开始应该准确率提升的很快  \n",
    "在有隐层的神经网络中，若权重初始化为零，则前向传播时，隐层中神经元的值都是零，这样在计算梯度下降时无论隐层中有多少神经元，其实都是在进行相同的计算，而根据BP四项原则的计算公式，cost对神经网络中的权重求偏导时，结果也应该是0，相当于没有训练  \n",
    "所以下一步先选择初始化权重方式"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参数2： 权重初始化方式"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "权重初始化：截断正态分布\n",
      "accuracy = 0.9795\n",
      "权重初始化：随机正态分布\n",
      "accuracy = 0.9778\n",
      "权重初始化：随机均匀分布\n",
      "accuracy = 0.2125\n"
     ]
    }
   ],
   "source": [
    "W1_1 = tf.Variable(tf.truncated_normal([784,392],stddev=0.05))   # 截断正态分布 \n",
    "W1_2 = tf.Variable(tf.random_normal([784,392],stddev=0.05))      # 随机正态分布\n",
    "W1_3 = tf.Variable(tf.random_uniform([784,392]))                 # 随机均匀分布\n",
    "b1 = tf.Variable(tf.zeros([392]))                                # 偏差一般都是初始化为0\n",
    "\n",
    "logist1_1 = tf.matmul(x, W1_1) + b1\n",
    "logist1_2 = tf.matmul(x, W1_2) + b1 \n",
    "logist1_3 = tf.matmul(x, W1_3) + b1 \n",
    "\n",
    "y1_1 = tf.nn.relu(logist1_1)\n",
    "y1_2 = tf.nn.relu(logist1_2)\n",
    "y1_3 = tf.nn.relu(logist1_3)\n",
    "\n",
    "W2_1 = tf.Variable(tf.truncated_normal([392, 10],stddev=0.05)) \n",
    "W2_2 = tf.Variable(tf.random_normal([392, 10],stddev=0.05)) \n",
    "W2_3 = tf.Variable(tf.random_uniform([392, 10])) \n",
    "b2 = tf.Variable(tf.zeros([10]))       \n",
    "\n",
    "logist2_1 = tf.matmul(y1_1, W2_1) + b2\n",
    "logist2_2 = tf.matmul(y1_2, W2_2) + b2 \n",
    "logist2_3 = tf.matmul(y1_3, W2_3) + b2 \n",
    "\n",
    "print(\"权重初始化：截断正态分布\")\n",
    "pred_accuracy(logists=logist2_1, gamma=0.5, batch_size=100)\n",
    "print(\"权重初始化：随机正态分布\")\n",
    "pred_accuracy(logists=logist2_2, gamma=0.5, batch_size=100)\n",
    "print(\"权重初始化：随机均匀分布\")\n",
    "pred_accuracy(logists=logist2_3, gamma=0.5, batch_size=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "果然还是权重初始化的问题，瞬间准确率就提升了，已经达到将近98%。以上三种方式，最适合这个数据集的初始化方式是正态分布（高斯分布），而随机均匀分布的效果和初始化为0的效果类似，也是造成了所有隐层神经元值一样，梯度下降优化时没有用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 定一个单隐层的神经网络，并进行初始化\n",
    "def one_hidden_layer_init(num_neural):\n",
    "    W1 = tf.Variable(tf.truncated_normal([784,num_neural],stddev=0.05))\n",
    "    b1 = tf.Variable(tf.zeros([num_neural]))                   \n",
    "    logist1 = tf.matmul(x, W1) + b1           \n",
    "    y1 = tf.nn.relu(logist1)                \n",
    "\n",
    "    W2 = tf.Variable(tf.truncated_normal([num_neural, 10],stddev=0.05))   \n",
    "    b2 = tf.Variable(tf.zeros([10]))        \n",
    "    logist2 = tf.matmul(y1, W2) + b2   \n",
    "    \n",
    "    return logist2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参数3： 神经元个数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "num of neural is: 1\n",
      "accuracy = 0.3307\n",
      "num of neural is: 2\n",
      "accuracy = 0.4494\n",
      "num of neural is: 100\n",
      "accuracy = 0.9737\n",
      "num of neural is: 392\n",
      "accuracy = 0.9811\n",
      "num of neural is: 784\n",
      "accuracy = 0.9788\n",
      "num of neural is: 1000\n",
      "accuracy = 0.981\n",
      "num of neural is: 1500\n",
      "accuracy = 0.9808\n"
     ]
    }
   ],
   "source": [
    "num_neurals = [1,2,100,392,784,1000,1500]\n",
    "for neural in num_neurals:\n",
    "    logist = one_hidden_layer_init(neural)\n",
    "    print(\"num of neural is: %d\" %neural)\n",
    "    pred_accuracy(logists=logist, gamma=0.5, batch_size=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "随着神经元数量的增加，训练效果提升的很快，在神经元数量为392时已经达到98.11%的准确率，而后续在增加神经元数量已经不怎么提升准确率了，不知道是不是已经有过拟合现象，或者是训练效果提升的瓶颈已经不在这里了"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里感觉有点意外，因为初始化权重W1和W2是放在一个函数one_hidden_layer_init()里面的，返回值为logist，返回值作为下一个函数pred_accuracy的输入值，如果是在C语言里面，W1和W2都属于one_hidden_layer_init()的局部变量，生命周期仅限于该函数内，下一个函数通过梯度优化迭代上一个函数中的参数W和b是做不到的。。。。而这里利用python和TensorFlow就可以做到，有点神奇"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参数4:  batch_size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "batch = 50:\n",
      "accuracy = 0.9773\n",
      "batch = 100:\n",
      "accuracy = 0.9793\n",
      "batch = 200:\n",
      "accuracy = 0.9809\n",
      "batch = 500:\n",
      "accuracy = 0.9808\n",
      "batch = 1000:\n",
      "accuracy = 0.9805\n",
      "batch = 2000:\n",
      "accuracy = 0.9797\n"
     ]
    }
   ],
   "source": [
    "logist = one_hidden_layer_init(num_neural=392)\n",
    "\n",
    "batchs = [50,100,200,500,1000,2000]\n",
    "for batch in batchs:\n",
    "    print(\"batch = %d:\" %batch)\n",
    "    pred_accuracy(logists=logist, gamma=0.5, batch_size=batch)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "batch大小在200的时候达到最好的效果，达到98.09%，太小的话每次训练随机性比较大，类似于训练模型是给的数据集太少造成欠拟合，batch太大的话训练速度非常慢，而且每个step可能训练到后面会重复，所以后续的训练选择batch_size=200"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参数5:  学习率gamma"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gamma = 0.010000:\n",
      "accuracy = 0.9145\n",
      "gamma = 0.100000:\n",
      "accuracy = 0.9661\n",
      "gamma = 0.300000:\n",
      "accuracy = 0.9776\n",
      "gamma = 0.600000:\n",
      "accuracy = 0.981\n",
      "gamma = 0.800000:\n",
      "accuracy = 0.9829\n",
      "gamma = 1.000000:\n",
      "accuracy = 0.9801\n",
      "gamma = 5.000000:\n",
      "accuracy = 0.101\n",
      "gamma = 10.000000:\n",
      "accuracy = 0.0958\n"
     ]
    }
   ],
   "source": [
    "logist = one_hidden_layer_init(num_neural=392)\n",
    "\n",
    "gammas = [0.01,0.1,0.3,0.6,0.8,1,5,10]\n",
    "for gamma0 in gammas:\n",
    "    print(\"gamma = %f:\" %gamma0)\n",
    "    pred_accuracy(logists=logist, gamma=gamma0, batch_size=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当学习率为0.8的时候，达到最好的训练效果98.29%，学习率超过1之后，训练效果骤降，应该是已经远远偏离了最优解了。不过在实际训练中，学习率应该设置为一个动态的值，随着迭代次数增加逐渐递减，这样可以使训练前期更快收敛，训练后期也不会overshot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参数6:  激活函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy = 0.9818\n",
      "accuracy = 0.9647\n",
      "accuracy = 0.9787\n"
     ]
    }
   ],
   "source": [
    "W1 = tf.Variable(tf.truncated_normal([784,392],stddev=0.05))\n",
    "b1 = tf.Variable(tf.zeros([392]))                   \n",
    "logist1 = tf.matmul(x, W1) + b1           \n",
    "            \n",
    "W2 = tf.Variable(tf.truncated_normal([392, 10],stddev=0.05))   \n",
    "b2 = tf.Variable(tf.zeros([10])) \n",
    "\n",
    "y1_1 = tf.nn.relu(logist1)\n",
    "y1_2 = tf.nn.sigmoid(logist1)\n",
    "y1_3 = tf.nn.tanh(logist1)\n",
    "y1s = [y1_1, y1_2, y1_3]\n",
    "\n",
    "for y in y1s:\n",
    "    logist2 = tf.matmul(y, W2) + b2 \n",
    "    pred_accuracy(logists=logist2, gamma=0.8, batch_size=200)\n",
    "    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "发现激活函数选择relu的效果是最好的，最适合这个数据集"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***到目前为止，已经初步确定了几个能使训练结果达到98%左右的参数，下面其他参数的调节和模型训练将使用以下参数***  \n",
    "权重初始化：截断正态分布  \n",
    "神经元个数：392个  \n",
    "batch_size：200  \n",
    "激活函数：  ReLu  \n",
    "学习率：   0.8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参数7:  隐层数量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "两个隐层：\n",
      "accuracy = 0.9824\n"
     ]
    }
   ],
   "source": [
    "# 两个隐层\n",
    "W1 = tf.Variable(tf.truncated_normal([784,392],stddev=0.05))\n",
    "b1 = tf.Variable(tf.zeros([392]))                   \n",
    "logist1 = tf.matmul(x, W1) + b1           \n",
    "y1 = tf.nn.relu(logist1)    \n",
    "\n",
    "W2 = tf.Variable(tf.truncated_normal([392,392],stddev=0.05))\n",
    "b2 = tf.Variable(tf.zeros([392]))                   \n",
    "logist2 = tf.matmul(y1, W2) + b2          \n",
    "y2 = tf.nn.relu(logist2)  \n",
    "\n",
    "W3 = tf.Variable(tf.truncated_normal([392, 10],stddev=0.05))   \n",
    "b3 = tf.Variable(tf.zeros([10]))        \n",
    "logist3 = tf.matmul(y2, W3) + b3 \n",
    "\n",
    "print(\"两个隐层：\")\n",
    "pred_accuracy(logists=logist3, gamma=0.8, batch_size=200)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "三个隐层：\n",
      "accuracy = 0.9818\n"
     ]
    }
   ],
   "source": [
    "# 三个隐层\n",
    "W1 = tf.Variable(tf.truncated_normal([784,392],stddev=0.05))\n",
    "b1 = tf.Variable(tf.zeros([392]))                   \n",
    "logist1 = tf.matmul(x, W1) + b1           \n",
    "y1 = tf.nn.relu(logist1)    \n",
    "\n",
    "W2 = tf.Variable(tf.truncated_normal([392,392],stddev=0.05))\n",
    "b2 = tf.Variable(tf.zeros([392]))                   \n",
    "logist2 = tf.matmul(y1, W2) + b2          \n",
    "y2 = tf.nn.relu(logist2)  \n",
    "\n",
    "W3 = tf.Variable(tf.truncated_normal([392,392],stddev=0.05))\n",
    "b3 = tf.Variable(tf.zeros([392]))                   \n",
    "logist3 = tf.matmul(y2, W3) + b3         \n",
    "y3 = tf.nn.relu(logist3)  \n",
    "\n",
    "W4 = tf.Variable(tf.truncated_normal([392, 10],stddev=0.05))   \n",
    "b4 = tf.Variable(tf.zeros([10]))        \n",
    "logist4 = tf.matmul(y3, W4) + b4 \n",
    "\n",
    "print(\"三个隐层：\")\n",
    "pred_accuracy(logists=logist4, gamma=0.8, batch_size=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1个隐层准确率：98.29%  \n",
    "2个隐层准确率：98.24%  \n",
    "3个隐层准确率：98.18%  \n",
    "可以看出三种情况训练效果差不多，可能是这个数据集本身就比较单一，从单个感知器的训练结果为92%可以看出，不需要复杂的网络就可以达到较高的准确率。而1个隐层的准确率更高一点，可能是因为两个隐层和三个隐层（神经元个数都是392）已经过拟合了，效果反而下降一些"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参数8:  是否有正则项&正则因子"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了使代码不那么庞大，这里选择单隐层神经网络调参  \n",
    "\n",
    "先定义一个处理有L1和L2正则项的预测函数，参数为：  \n",
    "logists: 未激活的y  \n",
    "W1:输入层与隐层之间的权重  \n",
    "W2: 隐层与输出层之间的权重  \n",
    "alpha：正则因子，决定惩罚力度  \n",
    "L:L1或L2正则  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "def pred_nn_regularization(logists,W1,W2,alpha,L):\n",
    "    sess.run(tf.global_variables_initializer()) \n",
    "    \n",
    "    if L=='L1': \n",
    "        #cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=logists))\n",
    "        #tf.add_to_collection(\"losses\",cross_entropy)                               \n",
    "        #tf.add_to_collection(\"losses\",tf.contrib.layers.l1_regularizer(0.0001)(W1)) \n",
    "        #tf.add_to_collection(\"losses\",tf.contrib.layers.l1_regularizer(0.0001)(W2)) \n",
    "        #loss = tf.add_n(tf.get_collection(\"losses\")[1:])          \n",
    "        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=logists)\\\n",
    "                                     +tf.contrib.layers.l1_regularizer(alpha)(W1)\\\n",
    "                                     +tf.contrib.layers.l1_regularizer(alpha)(W2))    \n",
    "        #print(\"L = L1:\")\n",
    "    elif L=='L2':\n",
    "        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=logists)\\\n",
    "                                      +tf.contrib.layers.l2_regularizer(alpha)(W1)\\\n",
    "                                      +tf.contrib.layers.l2_regularizer(alpha)(W2)) \n",
    "        #print(\"L = L2:\")\n",
    "    \n",
    "    train_step = tf.train.GradientDescentOptimizer(0.8).minimize(cross_entropy) \n",
    "    for i in range(3000):                                             \n",
    "        batch_xs, batch_ys = mnist.train.next_batch(200)                            \n",
    "        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})\n",
    "        \n",
    "    correct_prediction = tf.equal(tf.argmax(logists,1), tf.argmax(y_,1))      \n",
    "    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) \n",
    "    print('accuracy =',sess.run(accuracy,feed_dict={x: mnist.test.images, y_: mnist.test.labels}))   \n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### L1正则"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "L1: alpha = 0.000010\n",
      "accuracy = 0.9805\n",
      "L1: alpha = 0.000100\n",
      "accuracy = 0.9803\n",
      "L1: alpha = 0.001000\n",
      "accuracy = 0.9719\n",
      "L1: alpha = 0.010000\n",
      "accuracy = 0.9275\n",
      "L1: alpha = 0.100000\n",
      "accuracy = 0.8867\n"
     ]
    }
   ],
   "source": [
    "alphas = [0.00001,0.0001,0.001,0.01,0.1]\n",
    "\n",
    "for alpha in alphas:  \n",
    "    W1 = tf.Variable(tf.truncated_normal([784,392],stddev=0.05))\n",
    "    b1 = tf.Variable(tf.zeros([392]))                   \n",
    "    logist1 = tf.matmul(x, W1) + b1           \n",
    "    y1 = tf.nn.relu(logist1)                \n",
    "\n",
    "    W2 = tf.Variable(tf.truncated_normal([392, 10],stddev=0.05))   \n",
    "    b2 = tf.Variable(tf.zeros([10]))        \n",
    "    logist2 = tf.matmul(y1, W2) + b2   \n",
    "    \n",
    "    print(\"L1: alpha = %f\" %alpha)\n",
    "    pred_nn_regularization(logists=logist2,W1=W,W2=W2,alpha=alpha,L='L1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### L2正则"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "L1: alpha = 0.000010\n",
      "accuracy = 0.9809\n",
      "L1: alpha = 0.000100\n",
      "accuracy = 0.9806\n",
      "L1: alpha = 0.001000\n",
      "accuracy = 0.9652\n",
      "L1: alpha = 0.010000\n",
      "accuracy = 0.9349\n",
      "L1: alpha = 0.100000\n",
      "accuracy = 0.9109\n"
     ]
    }
   ],
   "source": [
    "alphas = [0.00001,0.0001,0.001,0.01,0.1]\n",
    "\n",
    "for alpha in alphas:  \n",
    "    W1 = tf.Variable(tf.truncated_normal([784,392],stddev=0.05))\n",
    "    b1 = tf.Variable(tf.zeros([392]))                   \n",
    "    logist1 = tf.matmul(x, W1) + b1           \n",
    "    y1 = tf.nn.relu(logist1)                \n",
    "\n",
    "    W2 = tf.Variable(tf.truncated_normal([392, 10],stddev=0.05))   \n",
    "    b2 = tf.Variable(tf.zeros([10]))        \n",
    "    logist2 = tf.matmul(y1, W2) + b2   \n",
    "    \n",
    "    print(\"L1: alpha = %f\" %alpha)\n",
    "    pred_nn_regularization(logists=logist2,W1=W,W2=W2,alpha=alpha,L='L1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "增加L1和L2正则，两者都是在步长为0.00001的是达到最好的训练效果，但还是比没有增加正则项的差一点，正则项主要是用来限制模型复杂度，缓解过拟合的，这样的结果表明，该数据集和两层神经模型的训练并不希望限制复杂度，可能还是因为数据集太简单了，增加了正则化反而不好。而且可以看出随着正则化强度的加大，准确率一直在降低"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "小结：  \n",
    "1、单个感知器由于模型太简单了，训练结果应该是有点欠拟合的，但是也能达到92%左右的正确率，说明这个数据集确实比较简单，不愧是深度学习界的HelloWorld  \n",
    "2、增加一个隐层对提升训练结果的影响特别大，但后续再增加，增益效果就不明显了  \n",
    "3、神经网络的权重初始化不能是0或者全是相同的值  \n",
    "4、当前调节的最优的参数：权重初始化（截断正态分布）、神经元个数（392个）、batch_size（200）、激活函数（ReLu）、学习率（0.8）、隐层数目（1）、不需要正则项。。。。得到的最高准确率为98.29%  \n",
    "5、本次作业的调优还有些不足的地方——A.函数可以进一步封装的更好些，减少重复代码；B.参数的调优基本是逐个确定的，最后的正确率98.29%应该不是全连接神经网络的最优值（当然卷积神经网络另说了），如果能进行多个参数联调，可能就能进一步提高准确率，类似于GridSearchCV一样  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
